{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "921c7347",
   "metadata": {},
   "source": [
    "- old api version\n",
    "\n",
    "    ```\n",
    "    pip install openai==0.28\n",
    "    ```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a44098f",
   "metadata": {},
   "source": [
    "## models list"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "48cd0895",
   "metadata": {},
   "source": [
    "```\n",
    "import openai\n",
    "model_list = openai.Model.list()\n",
    "\n",
    "[model['id'] for model in openai.Model.list()['data']]\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e340014",
   "metadata": {},
   "source": [
    "- gpt-4\n",
    "    - gpt-4\n",
    "    - gpt-4-vision-preview: gpt-4v\n",
    "    - gpt-4-1106-preview: gpt-4-turbo"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "57472b10",
   "metadata": {},
   "source": [
    "## api"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e3e84342",
   "metadata": {},
   "source": [
    "- 2023年11月的策略（当前7月就已更新），add payment $5，就会在api支持的models里增加 gpt-4 api\n",
    "- 要求其格式化输出（format response），比如 json\n",
    "\n",
    "```\n",
    "At the end of your response, please provide the following information in JSON format:\n",
    "{\n",
    "    \"key1\": \"<your_intention_here>\",\n",
    "    \"key2\": \"<your_action_here>\"\n",
    "}\n",
    "```"
   ]
  },
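  {
   "cell_type": "markdown",
   "id": "b7e2d9a4",
   "metadata": {},
   "source": [
    "Since the JSON object comes after free-form text, the caller still has to pull it out of the reply. A minimal sketch, assuming the reply ends with a single top-level JSON object; `extract_trailing_json` and the sample reply are hypothetical, not part of the `openai` package:\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "\n",
    "def extract_trailing_json(reply):\n",
    "    # Hypothetical helper: return the JSON object that ends the reply.\n",
    "    decoder = json.JSONDecoder()\n",
    "    # Walk the '{' positions from right to left so nested objects are handled.\n",
    "    pos = reply.rfind('{')\n",
    "    while pos != -1:\n",
    "        try:\n",
    "            obj, end = decoder.raw_decode(reply, pos)\n",
    "        except json.JSONDecodeError:\n",
    "            obj, end = None, -1\n",
    "        # Accept only a dict that runs to the end of the reply.\n",
    "        if isinstance(obj, dict) and not reply[end:].strip():\n",
    "            return obj\n",
    "        pos = reply.rfind('{', 0, pos)\n",
    "    raise ValueError('no trailing JSON object found')\n",
    "\n",
    "\n",
    "reply = 'I will open the door.\\n{\"key1\": \"leave the room\", \"key2\": \"open_door\"}'\n",
    "print(extract_trailing_json(reply))\n",
    "```\n",
    "\n",
    "Scanning from the right avoids being confused by braces that appear earlier in the free-form part of the reply."
   ]
  },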
  {
   "cell_type": "markdown",
   "id": "cdb569c0-5bd9-408d-9e4a-3496ae9d6dc3",
   "metadata": {},
   "source": [
    "### 版本 `>= 1.0.0`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "eca0e27f-d70e-469c-9925-70d2292ae39b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install --upgrade openai"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "14983a7a-289d-4337-9507-537b60c19332",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'1.17.1'"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import openai\n",
    "from openai import OpenAI\n",
    "openai.__version__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "7d5e3b4f-f317-4b8c-ba4a-770c76d8a3f6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# client = OpenAI(api_key='sk-xx')\n",
    "# client.models.list()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d44f5c1d-f753-4651-885a-6eb61fb1c28b",
   "metadata": {},
   "source": [
    "### response"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac055e2c-2a37-4ac8-a92d-4b1f3f36e88f",
   "metadata": {},
   "source": [
    "- finish reason: `response.choices[0].finish_reason`\n",
    "    - https://platform.openai.com/docs/guides/text-generation/chat-completions-api"
   ]
  },
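  {
   "cell_type": "markdown",
   "id": "c3a91f06",
   "metadata": {},
   "source": [
    "A minimal sketch of branching on `finish_reason`. It uses a stand-in object with the same attribute path instead of a real API response, so it runs offline. `describe_finish` is a hypothetical helper, and the listed values (`stop`, `length`, `content_filter`, `tool_calls`) are assumed to be the ones the chat completions API returns; check the guide linked above for the current list.\n",
    "\n",
    "```python\n",
    "from types import SimpleNamespace\n",
    "\n",
    "\n",
    "def describe_finish(response):\n",
    "    # Hypothetical helper: map finish_reason to a short explanation.\n",
    "    reason = response.choices[0].finish_reason\n",
    "    explanations = {\n",
    "        'stop': 'model finished naturally or hit a stop sequence',\n",
    "        'length': 'output was cut off by max_tokens or the context limit',\n",
    "        'content_filter': 'content was omitted by the content filter',\n",
    "        'tool_calls': 'model asked to call a tool',\n",
    "    }\n",
    "    return explanations.get(reason, f'unrecognized finish_reason: {reason!r}')\n",
    "\n",
    "\n",
    "# Stand-in with the same attribute path as a chat completion response.\n",
    "fake = SimpleNamespace(choices=[SimpleNamespace(finish_reason='length')])\n",
    "print(describe_finish(fake))\n",
    "```\n",
    "\n",
    "In practice `length` is the case worth checking first, since a truncated reply may be missing the JSON block requested in the prompt."
   ]
  },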
  {
   "cell_type": "markdown",
   "id": "0b3507e3-8d95-487f-a6a9-d1968d0a1854",
   "metadata": {},
   "source": [
    "## OpenAI"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc63cfe5-b632-42b4-b3c3-03be265a1528",
   "metadata": {},
   "source": [
    "- 不只支持 openai 的 api"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9f172143-86f8-4bb1-bdd0-50a616a7a592",
   "metadata": {},
   "outputs": [],
   "source": [
    "from openai import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "2f9d156d-9470-43fe-a042-0a5bbe5684b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# deepseek\n",
    "# client = OpenAI(api_key=\"sk-xx\", base_url=\"https://api.deepseek.com/\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af482a22",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true
   },
   "source": [
    "## encoding & `tiktoken`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36c1a35d",
   "metadata": {},
   "source": [
    "- https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb\n",
    "- prevent EOS: `'<|endoftext|>'`\n",
    "    - \"text-davinci-003\"：`logit_bias={\"50256\": -100}` \n",
    "        - `Completion` API \n",
    "        - `tiktoken.get_encoding('gpt2')`\n",
    "    - \"gpt-4\"：`logit_bias={\"100257\": -100}`\n",
    "        - `ChatCompletion`\n",
    "        - `tiktoken.get_encoding('cl100k_base')`"
   ]
  },
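  {
   "cell_type": "markdown",
   "id": "d5f47b18",
   "metadata": {},
   "source": [
    "The two cases above differ only in the token id of `<|endoftext|>`. A minimal sketch that builds the `logit_bias` argument from those ids; they are hard-coded from the list above so the snippet runs without loading any encoding, and `EOT_IDS` and `ban_eot` are hypothetical names:\n",
    "\n",
    "```python\n",
    "# End-of-text token ids as listed above, keyed by tiktoken encoding name.\n",
    "EOT_IDS = {'gpt2': 50256, 'cl100k_base': 100257}\n",
    "\n",
    "\n",
    "def ban_eot(encoding_name, bias=-100):\n",
    "    # logit_bias keys are token ids written as strings.\n",
    "    return {str(EOT_IDS[encoding_name]): bias}\n",
    "\n",
    "\n",
    "print(ban_eot('gpt2'))\n",
    "print(ban_eot('cl100k_base'))\n",
    "```\n",
    "\n",
    "With `tiktoken` installed, the same id can be read from `tiktoken.get_encoding(name).eot_token` instead of being hard-coded."
   ]
  },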
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "0b3ea771",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-10-25T15:15:31.719507Z",
     "start_time": "2023-10-25T15:15:31.687451Z"
    }
   },
   "outputs": [],
   "source": [
    "import tiktoken"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "51700c52",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-10-25T15:30:10.201487Z",
     "start_time": "2023-10-25T15:30:10.191495Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['gpt2', 'r50k_base', 'p50k_base', 'p50k_edit', 'cl100k_base']"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tiktoken.list_encoding_names()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "6d1d3711",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-10-25T15:35:15.729857Z",
     "start_time": "2023-10-25T15:35:15.722051Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Encoding 'p50k_base'>\n",
      "<Encoding 'p50k_base'>\n",
      "<Encoding 'cl100k_base'>\n"
     ]
    }
   ],
   "source": [
    "for m in ['text-davinci-002', 'text-davinci-003', 'gpt-3.5-turbo']:\n",
    "    print(tiktoken.encoding_for_model(m))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "c7a74666",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-10-25T15:30:17.219123Z",
     "start_time": "2023-10-25T15:30:17.211862Z"
    }
   },
   "outputs": [],
   "source": [
    "tokenizer = tiktoken.get_encoding('cl100k_base')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "8cca5906",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-10-25T15:16:45.382754Z",
     "start_time": "2023-10-25T15:16:45.371568Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "100277"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tokenizer.n_vocab"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "a6655f37",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-10-25T15:17:09.621449Z",
     "start_time": "2023-10-25T15:17:09.611146Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'<|endofprompt|>',\n",
       " '<|endoftext|>',\n",
       " '<|fim_middle|>',\n",
       " '<|fim_prefix|>',\n",
       " '<|fim_suffix|>'}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tokenizer.special_tokens_set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "d87d5f85",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-10-25T15:36:06.916897Z",
     "start_time": "2023-10-25T15:36:06.907210Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "100257"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# EOT: end of text\n",
    "tokenizer.eot_token"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "9d5cdb37",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-10-25T15:19:16.421790Z",
     "start_time": "2023-10-25T15:19:16.408691Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[100257]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tokenizer.encode('<|endoftext|>', allowed_special={'<|endoftext|>'})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "c8f75fa4",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-10-25T15:19:53.727813Z",
     "start_time": "2023-10-25T15:19:53.717688Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[100257]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tokenizer.encode('<|endoftext|>', allowed_special=tokenizer.special_tokens_set)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "9d3bac5b",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-10-25T15:31:07.637929Z",
     "start_time": "2023-10-25T15:31:07.219660Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[50256]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tiktoken.get_encoding('gpt2').encode('<|endoftext|>', allowed_special=tokenizer.special_tokens_set)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
